Sharing data and work across queries in analytical workloads

Author

  • Iraklis Psaroudakis
Abstract

Traditionally, query execution engines in relational databases have followed a query-centric model: they optimize and execute each incoming query using a separate execution plan, independent of other concurrent queries. For workloads with low contention for resources, or workloads with short-lived queries, this model keeps the optimization phase fast and produces efficient execution plans. For workloads with heavy contention, or workloads with long-running analytical queries, this model cannot exploit the sharing opportunities that might exist among concurrent queries in order to save I/O, CPU, and RAM resources. We argue that exploiting these sharing opportunities is a crucial step towards handling these increasingly common workloads. In this paper, we study three research prototype systems that employ various methodologies for sharing data and work: (a) the QPipe query execution engine [1], which employs a circular scan per table and shares work through simultaneous pipelining, (b) the DataPath system [2], which employs an uninterrupted linear scan per disk and shares work through a global query plan, and (c) the SharedDB system [3], which employs a circular scan per table partition, shares work through a global query plan, uses batched execution, and services both OLTP and OLAP workloads under response time guarantees. We classify these methodologies, analyze their commonalities and differences, and identify their strengths and shortcomings.
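To make the idea of a shared circular scan concrete, the following is a minimal Python sketch, not the implementation of QPipe, DataPath, or SharedDB. It assumes an in-memory table and row-at-a-time delivery, and every name in it (`CircularScan`, `AttachedQuery`, `attach`, `step`, `run`) is hypothetical. A query attaches at the scan's current position and detaches once it has seen every row exactly once, so concurrent queries consume the same reads.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AttachedQuery:
    """A query riding on the shared scan (hypothetical structure)."""
    name: str
    predicate: Callable[[int], bool]
    remaining: int  # rows still to be seen before one full cycle is done
    results: List[int] = field(default_factory=list)


class CircularScan:
    """One scan per table; rows are read once and fed to all attached queries."""

    def __init__(self, table):
        self.table = list(table)
        self.pos = 0          # current position of the shared scan
        self.active = []      # queries currently attached
        self.rows_read = 0    # total rows read from the table

    def attach(self, name, predicate):
        # A new query starts at the current position, not at row 0.
        q = AttachedQuery(name, predicate, remaining=len(self.table))
        if q.remaining > 0:
            self.active.append(q)
        return q

    def step(self):
        # Read one row and hand it to every attached query.
        if not self.active:
            return
        row = self.table[self.pos]
        self.rows_read += 1
        for q in self.active:
            if q.predicate(row):
                q.results.append(row)
            q.remaining -= 1
        # Detach queries that have completed a full cycle.
        self.active = [q for q in self.active if q.remaining > 0]
        self.pos = (self.pos + 1) % len(self.table)

    def run(self):
        # Keep scanning (wrapping around) until no query is attached.
        while self.active:
            self.step()


scan = CircularScan(range(10))
q1 = scan.attach("q1", lambda r: r % 2 == 0)
for _ in range(4):
    scan.step()                               # q1 has seen rows 0..3
q2 = scan.attach("q2", lambda r: r > 6)       # q2 joins at position 4
scan.run()
```

In this toy run the table is read 14 times in total, whereas two independent scans of the 10-row table would read 20 rows; the second query sees its rows in wrapped order (4 through 9, then 0 through 3), which is why results from a circular scan are not delivered in table order.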

Related resources

Shared Execution of Recurring Workloads in MapReduce

With the increasing complexity of data-intensive MapReduce workloads, Hadoop must often accommodate hundreds or even thousands of recurring analytics queries that periodically execute over frequently updated datasets, e.g., latest stock transactions, new log files, or recent news feeds. For many applications, such recurring queries come with user-specified service-level agreements (SLAs), commo...

Sharing Data and Work Across Concurrent Analytical Queries

Today’s data deluge enables organizations to collect massive data, and analyze it with an ever-increasing number of concurrent queries. Traditional data warehouses (DW) face a challenging problem in executing this task, due to their query-centric model: each query is optimized and executed independently. This model results in high contention for resources. Thus, modern DW depart from the queryc...

Improvement of the Analytical Queries Response Time in Real-Time Data Warehouse using Materialized Views Concatenation

A real-time data warehouse is a collection of recent and hierarchical data that is used for managers’ decision-making by creating online analytical queries. The volume of data collected from data sources and entered into the real-time data warehouse is constantly increasing. Moreover, as the volume of input data to the real time data warehouse increases, the interference between online loading ...

MiniTasking: Improving Cache Performance for Multiple Query Workloads

This paper proposes a novel idea, called MiniTasking, to reduce the number of cache misses by improving the data temporal locality for multiple concurrent queries. Our idea is based on the observation that, in many workloads such as decision support systems (DSS), there is usually a significant amount of data sharing among different concurrent queries. MiniTasking exploits such data sharing charac...

MQJoin: Efficient Shared Execution of Main-Memory Joins

Database architectures typically process queries one-at-a-time, executing concurrent queries in independent execution contexts. Often, such a design leads to unpredictable performance and poor scalability. One approach to circumvent the problem is to take advantage of sharing opportunities across concurrently running queries. In this paper we propose Many-Query Join (MQJoin), a novel method for...

Publication date: 2012